feat(ai): add $ai_completion_id and $ai_provider_metadata to OpenAI events#3306
Conversation
|
@johnsykim is attempting to deploy a commit to the PostHog Team on Vercel. A member of the Team first needs to authorize it. |
Prompt To Fix All With AIFix the following 1 code review issue. Work through them one at a time, proposing concise fixes.
---
### Issue 1 of 1
packages/ai/tests/openai-utils.test.ts:29-38
These two test cases each contain multiple inline assertions that should be parameterized. The project explicitly prefers `it.each` for multi-case checks, and the `extractRequestId` tests in the same file already follow this pattern correctly.
```suggestion
it.each([
[{ systemFingerprint: 'fp_1' }, { system_fingerprint: 'fp_1' }],
[{ requestId: 'req_1' }, { request_id: 'req_1' }],
])('omits keys whose value is missing: %p → %p', (input, expected) => {
expect(buildProviderMetadata(input)).toEqual(expected)
})
it.each([
[{}],
[{ systemFingerprint: undefined, requestId: undefined }],
[{ systemFingerprint: null, requestId: null }],
])('returns undefined when there is nothing to report: %p', (input) => {
expect(buildProviderMetadata(input)).toBeUndefined()
})
```
Reviews (4): Last reviewed commit: "feat(ai): add $ai_completion_id and $ai_..." | Re-trigger Greptile |
|
@greptile review |
|
This PR hasn't seen activity in a week! Should it be merged, closed, or further worked on? If you want to keep it open, post a comment or remove the |
|
@haacked @danielbachhuber Sorry for random tagging (please tell me if there is a better PostHog POC to reach out to). What's the best way to get PostHog team's attention on this PR? Is there a certain review and release process I have to follow? |
|
@johnsykim Hi! I'm no longer with PostHog but, based on the history of the files, @richardsolomou, @carlos-marchal-ph, or @Radu-Raicea might be able to give you a review. |
|
This PR hasn't seen activity in a week! Should it be merged, closed, or further worked on? If you want to keep it open, post a comment or remove the |
richardsolomou
left a comment
There was a problem hiding this comment.
Thanks for putting this together — the motivation (correlating PostHog events to OpenAI's logs dashboard) is a real pain point and the implementation is clean. Chatted with the team about the schema shape, and we'd like to adjust the approach before merging.
Design ask
- Move OpenAI-specific fields under a
$ai_provider_metadatablob —$ai_*has been provider-agnostic so far, andsystem_fingerprint/request_idare OpenAI-only concepts. Rather than living at the top level, they should go under a new$ai_provider_metadataproperty so each provider wrapper can surface its own metadata without polluting the shared namespace. Something like:This also sets the pattern for Anthropic / Gemini wrappers to follow later.$ai_provider_metadata: { system_fingerprint: '...', request_id: '...', }
- Keep
$ai_completion_idat the top level — response IDs generalize cleanly (Anthropic and Gemini both have equivalents), so this one belongs in the shared schema.
On the app side: we think rendering a link to the OpenAI log from the event inspector is valuable and we'll pick that up separately once this is merged!
Blocking
- Build fails with TS2353 on
packages/ai/tests/openai.test.ts:292—_request_idisn't a public property ofChatCompletion, so the literal is rejected under strict TS.pnpm buildfails on this branch and passes onmain. CI for external PRs only runs Wiz/Graphite/Vercel, which is why this slipped past. Easiest fix:let mockOpenAiChatResponse: ChatCompletion & { _request_id?: string } = {} as ChatCompletion
Suggestions
- Duplicated
(result as any)._request_idpattern — appears 6 times acrosspackages/ai/src/openai/index.tsandpackages/ai/src/openai/azure.ts. A small helper (extractRequestId(result)) would consolidate the cast and the explanatory comment in one place. - Streaming paths don't capture
requestId— onlycompletionId/systemFingerprintare pulled from chunks. The OpenAI SDK exposes_request_idon the aggregated stream response (via.withResponse()), so there's room to capture it for streams too. Fine as a follow-up.
Let me know if anything's unclear — happy to answer questions as you work through the restructure.
|
This PR hasn't seen activity in a week! Should it be merged, closed, or further worked on? If you want to keep it open, post a comment or remove the |
|
Thanks a lot @richardsolomou. I'll raise a revision sometime soon. |
|
This PR hasn't seen activity in a week! Should it be merged, closed, or further worked on? If you want to keep it open, post a comment or remove the |
|
Hi @johnsykim, taking this over from @richardsolomou! I agree with Richard's review. I'm going to mark this as a draft in the meantime, un-draft it yourself when it's ready for us again, and we'll get pinged to take a second look :) |
|
This PR hasn't seen activity in a week! Should it be merged, closed, or further worked on? If you want to keep it open, post a comment or remove the |
…vents
Adds two new auto-captured properties to `$ai_generation` events emitted by
the OpenAI and Azure OpenAI wrappers, enabling direct correlation between
PostHog events and OpenAI's Logs dashboard
(`platform.openai.com/logs/{completion_id}`):
- `$ai_completion_id` — provider-assigned response ID (e.g. `chatcmpl-…`,
`resp_…`). Generalises across providers, so it lives at the top of the
`$ai_*` namespace.
- `$ai_provider_metadata` — OpenAI-specific fields (`system_fingerprint`,
`request_id`) collected under a single blob rather than polluting the
shared schema. Sets the pattern for Anthropic / Gemini wrappers to follow.
Both options are accepted by the public `captureAiGeneration` primitive, so
external instrumentation produces identical events. Coverage spans
non-streaming and streaming Chat Completions plus Responses API
(`create` + `parse`) for both OpenAI and Azure OpenAI. Streaming paths
extract `id` / `system_fingerprint` from accumulated chunks; the
`x-request-id` header is read via the OpenAI SDK's semi-private
`_request_id` field through a small typed helper. Streaming
`request_id` capture (which needs `.withResponse()`) is left as a
follow-up — see review thread on PostHog#3306.
Addresses review feedback on PostHog#3306:
- restructure `system_fingerprint` / `request_id` under `$ai_provider_metadata`
- fix the TS2353 build error on the test mock by widening `ChatCompletion`
with `{ _request_id?: string }`
- consolidate the duplicated `(result as any)._request_id` cast into
`extractRequestId` with an explanatory comment in one place
- add Responses API `$ai_completion_id` / `request_id` test coverage
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ce4c721 to
bf5b13a
Compare
|
@carlos-marchal-ph @richardsolomou Revision is up, ready for review again. Brief summary of what changed:
Full revision notes in the updated PR description above. Thanks for hanging in there while I got around to this! |
|
@greptile review |
Prompt To Fix All With AIFix the following 2 code review issues. Work through them one at a time, proposing concise fixes.
---
### Issue 1 of 2
packages/ai/tests/openai-utils.test.ts:29-38
**Non-parameterised test blocks bundle multiple independent cases**
The `buildProviderMetadata` tests pack multiple distinct inputs into the same `it` block. Per the repo's simplicity rule ("we always prefer parameterised tests"), each input/output pair should be its own case in an `it.each` table. As written, a failure only points to the block, not the specific input that broke.
For example, "omits keys whose value is missing" tests `{ systemFingerprint }` and `{ requestId }` separately, and "returns undefined when there is nothing to report" tests `{}`, `{ …undefined }`, and `{ …null }` — these are three distinct edge-cases that would be clearer as three rows in an `it.each`.
### Issue 2 of 2
packages/ai/src/openai/index.ts:287-305
**Error capture path in streaming skips `completionId` even when it is available**
`completionIdFromResponse` and `systemFingerprintFromResponse` are declared in the outer scope and may have been populated by one or more chunks before the exception fires. However, the `catch` block that emits the error event never passes these values. When the stream fails mid-flight (after the first chunk carrying `chunk.id`), the error event has no `$ai_completion_id`, making it impossible to correlate the error against OpenAI's Logs dashboard — the exact use-case this PR enables.
The same gap exists in the equivalent streaming catch block in `azure.ts`.
Reviews (5): Last reviewed commit: "feat(ai): add $ai_completion_id and $ai_..." | Re-trigger Greptile |
- Streaming error path now surfaces accumulated completion metadata. When a stream fails mid-flight after consuming chunks that carried `chunk.id` / `chunk.system_fingerprint`, the error event was being emitted without `$ai_completion_id` / `$ai_provider_metadata` — the exact correlation IDs this PR is meant to enable. Hoisted the accumulators above the `try` block and pass them in the `catch` capture for both `index.ts` and `azure.ts`, chat and Responses. - Convert the `openai-utils` tests to parameterised `it.each` tables per the repo's "always prefer parameterised tests" convention; each input/output pair is now its own row so a failure points to the specific case that broke.
|
Addressed both Greptile points on
@greptile review |
|
Reviews (6): Last reviewed commit: "fix(ai): address Greptile feedback on th..." | Re-trigger Greptile |
Prompt To Fix All With AIFix the following 1 code review issue. Work through them one at a time, proposing concise fixes.
---
### Issue 1 of 1
packages/ai/tests/openai-utils.test.ts:11-14
`extractRequestId` returns the raw value of `_request_id` via `?? undefined`, which converts `null` but **not** an empty string. So `extractRequestId({ _request_id: '' })` returns `''`. `buildProviderMetadata` then silently drops it (truthy check), so behavior is still correct — but the interaction isn't exercised anywhere in the test suite. Adding the case documents the contract explicitly and guards against a future change to the truthy check silently breaking it.
```suggestion
['returns undefined for numeric input', 42, undefined],
['returns empty string when `_request_id` is empty string', { _request_id: '' }, ''],
])('%s', (_name, input, expected) => {
expect(extractRequestId(input)).toBe(expected)
})
```
Reviews (7): Last reviewed commit: "fix(ai): address Greptile feedback on th..." | Re-trigger Greptile |
|
This PR hasn't seen activity in a week! Should it be merged, closed, or further worked on? If you want to keep it open, post a comment or remove the |
|
@carlos-marchal-ph Let me know if the PR is good to go! |
Generated-By: PostHog Code Task-Id: 672e907b-e741-4f0e-abf2-76722b296982
carlos-marchal-ph
left a comment
There was a problem hiding this comment.
Looks good, just pushed a tiny style fix and will go ahead and merge it!
Summary
Adds two new auto-captured properties to
$ai_generationevents for the OpenAI and Azure OpenAI wrappers, enabling correlation between PostHog events and OpenAI's Logs dashboard (platform.openai.com/logs/{completion_id}):$ai_completion_id— provider-assigned response ID (chatcmpl-…,resp_…). Lives at the top of the$ai_*namespace because response IDs generalize across providers.$ai_provider_metadata— OpenAI-specific fields (system_fingerprint,request_id) collected under a single blob rather than polluting the shared, provider-agnostic schema. Sets the pattern for Anthropic / Gemini wrappers to follow.Both options are now accepted by the public
captureAiGenerationprimitive, so external instrumentation produces identical events.Revision notes (rebased against the new
captureAiGenerationprimitive)This branch was originally written against
sendEventToPosthog. Since #3499 landed, every wrapper funnels throughcaptureAiGeneration, so the implementation was rewritten cleanly on top of the new primitive.Addresses Richard's review:
$ai_provider_metadata.system_fingerprintandrequest_idare OpenAI-only concepts and now live under$ai_provider_metadatainstead of at the top level.$ai_completion_idat the top level. Response IDs generalize across providers, so this stays in the shared schema.mockOpenAiChatResponseis now typed asChatCompletion & { _request_id?: string }.pnpm buildpasses.(result as any)._request_idcast. Replaced all six call sites with a singleextractRequestId(result)helper inpackages/ai/src/openai/utils.ts, with an explanatory doc comment in one place.request_idis deferred to a follow-up as suggested — it requires threading.withResponse()through the streaming flow, which is a larger refactor than this PR. Streaming chats still capture$ai_completion_idandsystem_fingerprintfrom accumulated chunks.Also addresses Greptile's outstanding comment about Responses API
request_idcoverage: theresponses.parseandresponses.createweb-search tests both now exercise the positive case.What it captures, by path
$ai_completion_id$ai_provider_metadataresult.idsystem_fingerprint,request_idchunk.idsystem_fingerprint(from chunks)create)result.idrequest_idchunk.response.idparseresult.idrequest_idSame matrix for the Azure OpenAI wrapper.
Test plan
pnpm lintcleanpnpm buildclean (TS2353 fixed)extractRequestId/buildProviderMetadatahelpers (packages/ai/tests/openai-utils.test.ts) — 11 cases including the "empty → undefined" branchbasic completionasserts$ai_completion_id+$ai_provider_metadatawith both fieldsbasic streaming completionasserts$ai_completion_id+$ai_provider_metadatawithsystem_fingerprintonlyresponses parseasserts$ai_completion_id+$ai_provider_metadata.request_idresponses.create(web-search test) asserts$ai_completion_id+$ai_provider_metadata.request_idbasic completionmirrors the OpenAI assertionsOriginal background & motivation
Why this matters
The
@posthog/aipackage wraps the OpenAI SDK and auto-emits$ai_generationevents. Withoutresponse.id, there's no way to navigate from a PostHog$ai_generationevent to the corresponding entry in OpenAI's Logs dashboard (platform.openai.com/logs/{completion_id}).response.id, joining PostHog events with OpenAI's own logging system requires multi-hop lookups via entity IDs.x-request-id— having this on the PostHog event makes it trivial to file tickets with the right correlation ID.References
https://platform.openai.com/logs?api=chat-completionschatcmpl-{base62}(Chat Completions),resp_{base62}(Responses)_request_id: attached from thex-request-idresponse header by theopenainpm package